LaVAN: Localized and Visible Adversarial Noise
Authors: Danny Karmon, Yoav Goldberg, Daniel Zoran (Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel; DeepMind, London, UK)
Abstract
Most works on adversarial examples for deep-learning based image classifiers use noise that, while small, covers the entire image. We explore the case where the noise is allowed to be visible but confined to a small, localized patch of the image, without covering any of the main object(s) in the image. We show that it is possible to generate localized adversarial noises that cover only 2% of the pixels in the image, none of them over the main object, that are transferable across images and locations, and that successfully fool a state-of-the-art Inception v3 model with very high success rates.

[Teaser figure: noised images misclassified as Padlock (92.7%), Tiger Cat (94.4%), Car Mirror (94.5%), and Stingray (90.5%).]

1. Adversarial Noise

Deep neural-network architectures achieve remarkable results on image classification tasks. However, they are also susceptible to being fooled by adversarial examples: input instances which were modified in a particular way and, as a result, are misclassified by the network. Of course, for the adversarial example to be interesting, the change should be such that it does not confuse a human looking at the picture. Beyond the clear security implications, adversarial examples are also interesting as they may provide insights into the strengths, weaknesses, and blind-spots of these ubiquitous state-of-the-art classification models.

Most work on generating adversarial examples (we provide a more detailed review in section 5) focuses either on noise which, while being imperceptible to humans, covers the entire image (Goodfellow et al., 2015; Szegedy et al., 2014), or on visible noise that covers prominent features of the main object in the image in a "natural" way (e.g., glasses with a specific pattern around a person's eyes in a face identification task (Sharif et al., 2016)). In contrast, we look at visible noise that is localized to a small area of the image (a bounded box covering up to 2% of the pixels) and that does not cover the main object in the image. Figure 1 shows examples of such noised images that are misclassified by a state-of-the-art Inception v3 network with very high confidence.

A recent work by Brown et al. (2017) introduces a visible noise similar to ours. The two works are complementary to a large extent. Their work focuses on the security implications and attempts to generate universal noise "patches" that can be physically printed and put on any image, in either a black-box (when the attacked network is unknown) or white-box (when the attacked network is known) setup. As a consequence, the resulting adversarial patches in (Brown et al., 2017) are relatively large (in a white-box setup, the generated noise has to cover about 10% of the image to be effective in about 90% of the tested conditions, and a disguised patch has to cover about 35% of the image for a similar result) and also visually resemble the target class to some extent. We do not attempt to produce a physical attack; we are more interested in investigating the blind-spots of state-of-the-art image classifiers and the kinds of noise that can cause them to misclassify.

We experiment with two setups: working in the network domain and in the image domain. In the network-domain case, the noise is allowed to take any value and is not restricted to the dynamic range of images. Such noise is akin to shining a very bright flash-light into someone's eye.
In the image-domain case, the noise is kept to the dynamic range of images.

[Figure 1. Images with network-domain localized noise. The noise was generated for a specific input and location. Less than 2% of the pixels are noised. Network-domain noises are scaled to the image domain for presentation. Examples: Quail (99.8%) → Spiny Lobster (94.6%); Conch (99.4%) → Go-Kart (98.1%); Lifeboat (89.2%) → Scotch Terrier (99.8%); Submarine (98.9%) → Bonnet (99.1%).]

We show that in a white-box setting, we can generate localized visible noise that can be transferred to almost arbitrary images, covers only up to 2% of the image, does not cover any part of the main object in the image, and yet manages to make the network misclassify with very high confidence. This works both for the network-domain and the image-domain case, although the success rates are naturally higher for the network domain. Moreover, by inspecting the gradients of the network over noised images, we show that the network does not usually identify the noised patch as the main cause of misclassification, and in some cases hardly assigns it any blame at all. The latter is true even in the network-domain case. This is in contrast to the hypothesis posed in (Brown et al., 2017), in which the noise is said to be "much more salient" to the neural network than real-world objects. The localized noises we generate are universal in the sense that they can be applied to many different images and locations. However, they are specific to the model they were trained on (i.e., equivalent to the white-box setups in (Brown et al., 2017)). We believe these results highlight an interesting blind-spot in current state-of-the-art network architectures.

2. Localized noise for a single image and location

In the first setup, we explore generating a visible but localized adversarial noise that is specific to a single image and location within this image.

2.1. Setting and Method

Our method mostly follows the standard adversarial noise generation setup: we assume access to a trained model $M$ that assigns membership probabilities $p_M(y|x)$ to input images $x \in \mathbb{R}^{n = w \times h \times c}$. We denote by $\vec{y} = p_M(x)$ the vector of all class probabilities, and by $y = \arg\max_{y'} p_M(y = y'|x)$ the highest-scoring class for input $x$ (the classifier's prediction). Let $y_{source}$ be the classifier's prediction on input $x$ (the source class). We seek an image $x'$ that is classified by the network as $y_{target}$ (the target class). The image $x'$ is composed of the original image with an additive noise $\delta \in \mathbb{R}^n$: $x' = x + \delta$. This is cast as an optimization problem, seeking a value of $\delta$ that maximizes $p_M(y = y_{target}|x + \delta)$. The noise $\delta$ can be found using a stochastic gradient based algorithm. We depart from this standard methodology in two ways:

1. We want the noise $\delta$ to be confined to a small area over the image $x$, and to replace this area rather than be added to it. This is achieved by setting a mask $m \in \{0, 1\}^n$ and taking the noised image to be $(1 - m) \odot x + m \odot \delta$, where $\odot$ is element-wise multiplication.

2. Instead of training the noise to either maximize the probability of the target class or to minimize the probability of any other class (including the source class), we use a loss that does both things: it attempts to move the prediction towards the target class and away from the highest-scored class. We use the network activations prior to the final softmax layer, denoted as $M(x)$ and $M(y = y'|x)$.
This decouples the outputs for the different classes and speeds up convergence. The noise is thus obtained by solving:

$$\arg\max_{\delta}\ \big[\, M(y = y_{target} \mid (1 - m) \odot x + m \odot \delta) \;-\; M(y = y_{source} \mid (1 - m) \odot x + m \odot \delta) \,\big]$$
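To make the objective above concrete, the following is a minimal sketch of the single-image, single-location attack, assuming PyTorch and a pretrained torchvision Inception v3 (a recent torchvision that accepts string weight names). The patch size, location arguments, step count, and learning rate are illustrative choices, not values taken from the paper.

```python
import torch
import torchvision.models as models

# Pretrained Inception v3; in eval() mode the forward pass returns plain
# logits, i.e., the pre-softmax activations M(.) used in the loss above.
model = models.inception_v3(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

def localized_attack(x, y_source, y_target, top, left,
                     patch=50, steps=1000, lr=0.05, clamp_to_image=False):
    """x: a (1, 3, 299, 299) tensor, already preprocessed as the model expects."""
    # Binary mask m: 1 inside the patch, 0 elsewhere.
    m = torch.zeros_like(x)
    m[..., top:top + patch, left:left + patch] = 1.0

    delta = torch.zeros_like(x, requires_grad=True)  # the noise delta
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        x_adv = (1 - m) * x + m * delta              # replace the patch, do not add to it
        logits = model(x_adv)
        # Minimizing (source logit - target logit) maximizes the objective above.
        loss = logits[0, y_source] - logits[0, y_target]
        opt.zero_grad()
        loss.backward()
        opt.step()
        if clamp_to_image:
            # Image-domain variant: keep the patch within the image's own
            # value range (a rough proxy for the valid pixel range after
            # preprocessing); the network-domain variant leaves delta free.
            with torch.no_grad():
                delta.clamp_(x.min().item(), x.max().item())

    return ((1 - m) * x + m * delta).detach()
```

The `clamp_to_image` flag sketches the distinction between the network-domain and image-domain setups; for the network domain the noise is left unconstrained, as described above.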
Journal: CoRR
Volume: abs/1801.02608
Pages: -
Publication date: 2018